The interview is regarded as one of the most crucial steps in recruitment. To fully prepare for interviews with recruiters, job seekers usually practice mock interviews with each other. However, such mock interviews with peers are generally far from the real interview experience: the mock interviewers are not guaranteed to be professional and are unlikely to behave like real interviewers. Due to the rapid growth of online recruitment in recent years, recruiters tend to conduct interviews online, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) interview data are now available but still low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and the dialog generator, so that most parameters can be trained with ungrounded dialogs as well as resume data, which are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that EZInterviewer generates promising mock interviews. With its help, we hope to make mock interview practice easier for job seekers.
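To make the disentangling idea concrete, below is a minimal PyTorch sketch under our own assumptions (module names, dimensions, and the GRU decoder are illustrative, not the paper's actual architecture): a knowledge selector scores resume/JD entries against the dialog context, while a separate generator can be pre-trained on plentiful ungrounded dialogs, leaving only a thin interface to fit on the scarce interview dialogs.

```python
# Illustrative sketch only: the selector and generator are separate modules, so each
# can be trained on the data source that is NOT low-resource.
import torch
import torch.nn as nn

class KnowledgeSelector(nn.Module):
    """Scores resume/JD entries against the dialog context (trainable on resume data)."""
    def __init__(self, dim=256):
        super().__init__()
        self.ctx_proj = nn.Linear(dim, dim)
        self.kno_proj = nn.Linear(dim, dim)

    def forward(self, ctx, knowledge):            # ctx: [B, D], knowledge: [B, K, D]
        scores = torch.einsum("bd,bkd->bk", self.ctx_proj(ctx), self.kno_proj(knowledge))
        weights = scores.softmax(dim=-1)          # soft selection over the K entries
        return torch.einsum("bk,bkd->bd", weights, knowledge)

class DialogGenerator(nn.Module):
    """Generates the next interviewer utterance (pre-trainable on ungrounded dialogs)."""
    def __init__(self, vocab=30000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens, selected_knowledge):
        h0 = selected_knowledge.unsqueeze(0)      # condition decoding on the selected knowledge
        hidden, _ = self.rnn(self.embed(tokens), h0)
        return self.out(hidden)                   # next-token logits
```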
Nowadays, time-stamped web documents related to a general news query flood the Internet, and timeline summarization aims to concisely summarize the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time-series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract event-level attention during generation, with its sequential information retained, and use it to simulate the evolving attention of the ground-truth summary. The event-level attention can also assist extractive summarization, where the extracted summary likewise follows the time order. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
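As an illustration of what a graph-based event encoder can look like, here is a simplified sketch under our own assumptions (the cosine-similarity threshold and the single message-passing step are illustrative choices, not the UTS implementation):

```python
# Relate events by a content-similarity adjacency matrix, then run one round of
# message passing so each event representation carries global, cross-event context.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EventGraphEncoder(nn.Module):
    def __init__(self, dim=512, threshold=0.5):
        super().__init__()
        self.threshold = threshold
        self.message = nn.Linear(dim, dim)
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, events):                     # events: [N, D], one vector per dated event
        sim = F.cosine_similarity(events.unsqueeze(1), events.unsqueeze(0), dim=-1)
        adj = (sim > self.threshold).float()       # content-dependency edges
        adj = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        neighbor = adj @ self.message(events)      # aggregate messages from related events
        return torch.tanh(self.update(torch.cat([events, neighbor], dim=-1)))
```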
To balance annotation labor and the granularity of supervision, single-frame annotation has been introduced in temporal action localization. It provides a rough temporal location for an action but implicitly overstates the supervision from the annotated frame during training, leading to confusion between actions and backgrounds, i.e., action incompleteness and background false positives. To tackle these two challenges, in this work we present the Snippet Classification model and the Dilation-Erosion module. In the Dilation-Erosion module, we expand the potential action segments with a loose criterion to alleviate the problem of action incompleteness, and then remove the background from the potential action segments to alleviate the problem of background false positives. Relying on the single-frame annotation and the output of the snippet classification, the Dilation-Erosion module mines pseudo snippet-level ground truth, hard backgrounds, and evident backgrounds, which in turn further train the Snippet Classification model, forming a cyclic dependency. Furthermore, we propose a new embedding loss to aggregate the features of action instances with the same label and separate the features of actions from backgrounds. Experiments on THUMOS14 and ActivityNet 1.2 validate the effectiveness of the proposed method. Code has been made publicly available (https://github.com/LingJun123/single-frame-TAL).
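To illustrate the dilation-then-erosion idea on a 1-D sequence of snippet action scores, here is a rough sketch; the thresholds and structuring-element width are illustrative assumptions, not the paper's settings:

```python
# Dilation with a loose threshold grows candidate segments (against incompleteness);
# erosion then trims likely-background snippets and exposes hard vs. evident backgrounds.
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def dilation_erosion(snippet_scores, loose_thr=0.3, width=3):
    structure = np.ones(width, bool)
    candidate = binary_dilation(snippet_scores > loose_thr, structure=structure)
    core = binary_erosion(candidate, structure=structure)  # pseudo snippet-level ground truth
    hard_bg = candidate & ~core        # ambiguous snippets near segment borders
    evident_bg = ~candidate            # clearly background snippets
    return core, hard_bg, evident_bg
```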
In a citation graph, adjacent paper nodes share related scientific terms and topics. The graph thus conveys unique structural information about document-level relatedness that can be utilized in the paper summarization task to explore beyond intra-document information. In this work, we focus on leveraging citation graphs to improve extractive summarization of scientific papers under different settings. We first propose a Multi-granularity Unsupervised Summarization model (MUS) as a simple and low-cost solution to the task. MUS finetunes a pre-trained encoder model on the citation graph via link prediction tasks. Then, summary sentences are extracted from the corresponding paper by considering multi-granularity information. Preliminary results demonstrate that the citation graph is helpful even in a simple unsupervised framework. Motivated by this, we next propose a Graph-based Supervised Summarization model (GSS) to achieve more accurate results on the task when large-scale labeled data are available. Apart from employing link prediction as an auxiliary task, GSS introduces a gated sentence encoder and a graph information fusion module that exploit the graph information to polish the sentence representations. Experiments on a public benchmark dataset show that MUS and GSS bring substantial improvements over the prior state-of-the-art model.
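For intuition, a simplified link-prediction finetuning objective on the citation graph might look as follows (this is an illustrative sketch, not the MUS training code; `encode` stands in for any pretrained sentence/paper encoder):

```python
# Papers connected by a citation edge should get higher pair scores than random pairs;
# optimizing this objective pulls the encoder toward citation-graph structure.
import torch
import torch.nn.functional as F

def link_prediction_loss(encode, pos_pairs, neg_pairs):
    """encode: callable mapping a list of texts to an [N, D] embedding tensor."""
    def score(pairs):
        a = encode([p[0] for p in pairs])
        b = encode([p[1] for p in pairs])
        return (a * b).sum(dim=-1)                 # dot-product edge score
    pos, neg = score(pos_pairs), score(neg_pairs)
    labels = torch.cat([torch.ones_like(pos), torch.zeros_like(neg)])
    return F.binary_cross_entropy_with_logits(torch.cat([pos, neg]), labels)
```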
Retrieval-augmented Neural Machine Translation models have been successful in many translation scenarios. Different from previous works that make use of mutually similar but redundant translation memories (TMs), we propose a new retrieval-augmented NMT model that works with contrastively retrieved translation memories: TMs that are holistically similar to the source sentence while individually contrastive to each other, providing maximal information gain across three phases. First, in the TM retrieval phase, we adopt a contrastive retrieval algorithm to avoid the redundancy and uninformativeness of similar translation pieces. Second, in the memory encoding stage, given a set of TMs, we propose a novel Hierarchical Group Attention module to gather both the local context of each TM and the global context of the whole TM set. Finally, in the training phase, a multi-TM contrastive learning objective is introduced to learn the salient features of each TM with respect to the target sentence. Experimental results show that our framework obtains improvements over strong baselines on the benchmark datasets.
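One natural instantiation of such a contrastive retrieval criterion is an MMR-style greedy selection, sketched below under our own assumptions (the trade-off weight `alpha` is illustrative, and this is not necessarily the paper's exact algorithm):

```python
# Greedily pick TMs that are similar to the source sentence but dissimilar
# to the memories already chosen, trading relevance against redundancy.
import numpy as np

def contrastive_retrieve(src_vec, tm_vecs, k=4, alpha=0.7):
    """src_vec: [D]; tm_vecs: [N, D] unit-normalized embeddings of candidate TMs."""
    sim_to_src = tm_vecs @ src_vec
    selected = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in range(len(tm_vecs)):
            if i in selected:
                continue
            redundancy = max((tm_vecs[i] @ tm_vecs[j] for j in selected), default=0.0)
            score = alpha * sim_to_src[i] - (1 - alpha) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```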
Spiking neural networks (SNNs) mimic the computational strategies of the brain and exhibit great power in spatio-temporal information processing. As an essential factor of human perception, visual attention refers to the dynamic process of selecting salient regions in biological vision systems. Although mechanisms of visual attention have achieved great success in computer vision, they are rarely introduced into SNNs. Inspired by experimental observations of predictive attentional remapping, we propose a new spatio-temporal channel-fitting attention (SCTFA) module that can guide SNNs to efficiently capture potential target regions by using historically accumulated spatial-channel information. Through systematic evaluation on three event-stream datasets (DVS Gesture, SL-Animals-DVS, and MNIST-DVS), we demonstrate that the SNN with the SCTFA module (SCTFA-SNN) not only significantly outperforms the baseline SNN (BL-SNN) and two other SNN models with degraded attention modules, but also achieves accuracy competitive with existing state-of-the-art methods. Moreover, our detailed analysis shows that the proposed SCTFA-SNN model has strong robustness to noise and outstanding stability, while maintaining acceptable complexity and efficiency. Overall, these findings indicate that appropriately incorporating cognitive mechanisms of the brain may provide a promising approach to elevating the capabilities of SNNs.
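As a rough illustration of steering a layer with historically accumulated spatial-channel statistics, the sketch below gates each timestep of a spike tensor with channel and spatial attention computed from an exponentially accumulated history; this is our own simplification, not the exact SCTFA formulation (decay rate, kernel size, and reduction ratio are illustrative).

```python
# Accumulate past activity, then derive channel- and spatial-attention gates from it
# and apply them to the current timestep of the spike feature tensor.
import torch
import torch.nn as nn

class HistoryGuidedAttention(nn.Module):
    def __init__(self, channels, decay=0.8):
        super().__init__()
        self.decay = decay
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(), nn.Linear(channels // 4, channels))
        self.spatial_conv = nn.Conv2d(1, 1, kernel_size=7, padding=3)

    def forward(self, x_seq):                      # x_seq: [T, B, C, H, W] spike inputs
        history = torch.zeros_like(x_seq[0])
        outputs = []
        for x in x_seq:
            history = self.decay * history + (1 - self.decay) * x        # accumulated past activity
            ch = torch.sigmoid(self.channel_fc(history.mean(dim=(2, 3))))           # [B, C]
            sp = torch.sigmoid(self.spatial_conv(history.mean(dim=1, keepdim=True)))  # [B, 1, H, W]
            outputs.append(x * ch[:, :, None, None] * sp)                # gate the current timestep
        return torch.stack(outputs)
```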
Lyric-to-melody generation is an important task in songwriting, and it is also challenging due to its unique characteristics: the generated melodies should not only follow good musical patterns, but also align with features of the lyrics such as rhythm and structure. These characteristics cannot be handled well by neural generative models that learn the lyric-to-melody mapping end to end, due to several issues: (1) the lack of aligned lyric-melody training data to sufficiently learn lyric-melody feature alignment; (2) the lack of controllability in generation, which makes it impossible to explicitly guarantee lyric-melody feature alignment. In this paper, we propose ROC, a new paradigm for lyric-to-melody generation that solves the above issues through a generation-retrieval pipeline. Specifically, our paradigm has two stages: (1) a creation stage, where a large amount of music is generated by a neural melody language model and indexed in a database by several key features (e.g., chords, tonality, rhythm, and structural information such as chorus or verse); (2) a re-creation stage, where the melody is re-created by retrieving music pieces from the database according to the key features of the lyrics and assembling them according to composition guidelines and melody language model scores. Our ROC paradigm has several advantages: (1) it only requires unpaired melody data to train the melody language model, instead of the paired lyric-melody data required by previous models; (2) it achieves good lyric-melody feature alignment in lyric-to-melody generation. Experiments on English and Chinese datasets show that ROC outperforms previous neural-based lyric-to-melody generation models on both objective and subjective metrics.
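A toy sketch of the retrieval half of such a generation-retrieval pipeline is given below; the data structures and feature names are invented for illustration and do not reflect the released ROC implementation.

```python
# Melody pieces produced offline by the melody language model are indexed by key
# features; at re-creation time, pieces matching the lyric's features are retrieved
# and ranked by an external score (melody language model plus composition rules).
from collections import defaultdict

class MelodyDatabase:
    def __init__(self):
        self.index = defaultdict(list)

    def add(self, piece, tonality, rhythm, section):
        self.index[(tonality, rhythm, section)].append(piece)   # index generated pieces

    def retrieve(self, tonality, rhythm, section, score_fn):
        candidates = self.index.get((tonality, rhythm, section), [])
        return max(candidates, key=score_fn, default=None)      # best-scoring matching piece

def recreate_melody(db, lyric_features, score_fn):
    """lyric_features: list of (tonality, rhythm, section) tuples, one per lyric sentence."""
    return [db.retrieve(*feats, score_fn=score_fn) for feats in lyric_features]
```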
Graph-based models have recently achieved great success in person re-identification, where the graph topology (affinities) among different persons is first computed and information is then passed across them to obtain stronger features. However, we find that existing graph-based methods suffer from two issues in the visible-infrared person re-identification task (VI-ReID): 1) the train-test modality balance gap, which is a property of the VI-ReID task: the amounts of data from the two modalities are balanced in the training stage but extremely unbalanced at inference, leading to poor generalization of graph-based VI-ReID methods; 2) sub-optimal topology structure caused by the end-to-end learning manner of the graph module. We analyze that well-trained input features weaken the learning of the graph topology, making it insufficiently generalized during inference. In this paper, we propose a Counterfactual Intervention Feature Transfer (CIFT) method to tackle these problems. Specifically, Homogeneous and Heterogeneous Feature Transfer (H2FT) is designed to reduce the train-test modality gap via two independently designed graph modules and an unbalanced-scenario simulation. In addition, Counterfactual Relation Intervention (CRI) is proposed to utilize counterfactual intervention and causal-effect tools to highlight the role of the topology structure throughout the training process, which makes the graph topology more reliable. Extensive experiments on standard VI-ReID benchmarks demonstrate that CIFT outperforms state-of-the-art methods under various settings.
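A highly simplified reading of the counterfactual-intervention idea is sketched below: compare the prediction made with the learned affinity topology against one made with an intervened (e.g., uniform) topology, and treat the gap as the effect attributable to the topology. This is our own illustration, not the released CIFT code.

```python
# The difference between the factual and counterfactual predictions isolates the
# contribution of the learned graph topology, which can be used as a training signal.
import torch

def graph_transfer(features, affinity):
    """One propagation step: features [N, D], affinity [N, N] row-normalized."""
    return affinity @ features

def topology_effect(features, affinity, classifier):
    factual = classifier(graph_transfer(features, affinity))
    uniform = torch.full_like(affinity, 1.0 / affinity.size(0))   # counterfactual topology
    counterfactual = classifier(graph_transfer(features, uniform))
    return factual - counterfactual      # effect attributable to the learned topology
```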
Although deep-learning-based methods for monocular pedestrian detection have made great progress, they are still vulnerable to heavy occlusions. Multi-view information fusion is a potential solution, but its application is limited by the lack of annotated training samples, which increases the risk of overfitting. To address this problem, a data augmentation method is proposed that randomly generates 3D cylinder occlusions on the ground plane, with the average size of pedestrians, and projects them to multiple views to relieve the impact of overfitting during training. Moreover, the feature map of each view is projected onto multiple parallel planes at different heights using homographies, which allows the CNN to fully utilize the features of pedestrians at each height to infer pedestrian locations on the ground plane. Compared with state-of-the-art deep-learning-based methods, the proposed 3DROM method achieves greatly improved performance.
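To illustrate the homography-based projection of a view onto parallel planes at different heights, here is a small sketch of our own construction (not the released 3DROM code); it assumes a 3x4 projection matrix P whose ground-plane units already match the output grid, and warps one feature channel at a time.

```python
# For a world plane z = h, a point [x, y, h, 1] maps to P @ [x, y, h, 1], which equals
# [P[:,0], P[:,1], h*P[:,2] + P[:,3]] @ [x, y, 1] -- a 3x3 homography per height.
import cv2
import numpy as np

def plane_homography(P, height):
    """Homography mapping points on the world plane z=height to image pixels."""
    return np.stack([P[:, 0], P[:, 1], height * P[:, 2] + P[:, 3]], axis=1)

def project_to_planes(feature_channel, P, heights, out_size):
    """Warp a single HxW feature channel onto each plane z=h; returns one map per height."""
    warped = []
    for h in heights:
        H_plane_to_img = plane_homography(P, h)
        # cv2.warpPerspective takes the src->dst (image->plane) transform by default,
        # so pass the inverse of the plane->image homography.
        warped.append(cv2.warpPerspective(feature_channel, np.linalg.inv(H_plane_to_img), out_size))
    return warped
```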
In this report, we present our solution to four Ego4D challenge tasks, including Natural Language Query (NLQ), Moment Query (MQ), Object State Change Classification (OSCC), and PNR Localization (PNR). In particular, we leverage the recently released Ego4D dataset \cite{grauman2021ego4d} to pioneer egocentric VLP in terms of the pretraining dataset, the pretraining objective, and the development set. Based on these three designs, we develop a pretrained video-language model that can transfer its egocentric video-text representation, or video-only representation, to several video downstream tasks. Our egocentric VLP achieves 10.46 R@1&IoU@0.3 on NLQ, 10.33 mAP on MQ, 74% accuracy on OSCC, and a 0.67-second error on PNR. The code is available at https://github.com/showlab/egovlp.
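For context only, a generic symmetric video-text contrastive objective of the kind commonly used in video-language pretraining is sketched below; this is a standard formulation, not necessarily the exact pretraining objective of the report.

```python
# Symmetric InfoNCE: each clip should match its paired narration and vice versa.
import torch
import torch.nn.functional as F

def video_text_contrastive(video_emb, text_emb, temperature=0.07):
    """video_emb, text_emb: [B, D] L2-normalized embeddings of paired clips/narrations."""
    logits = video_emb @ text_emb.t() / temperature      # [B, B] similarity matrix
    targets = torch.arange(video_emb.size(0), device=video_emb.device)
    loss_v2t = F.cross_entropy(logits, targets)          # video-to-text matching
    loss_t2v = F.cross_entropy(logits.t(), targets)      # text-to-video matching
    return 0.5 * (loss_v2t + loss_t2v)
```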